Large-scale evaluation of automated clinical note de-identification and its impact on information extraction
Objective: (1) To evaluate a state-of-the-art natural language processing (NLP)-based approach to automatically de-identify a large set of diverse clinical notes. (2) To measure the impact of de-identification on the performance of information extraction algorithms applied to the de-identified documents. Material and methods: A cross-sectional study that included 3503 stratified, randomly selected clinical notes (over 22 note types) from five million documents produced at one of the largest US pediatric hospitals. The sensitivity, precision and F value of two automated de-identification systems for removing all 18 HIPAA-defined protected health information elements were computed and assessed against a manually generated ‘gold standard’, and statistical significance was tested. The automated de-identification performance was also compared with that of two human annotators on a 10% subsample of the gold standard. Finally, the effect of de-identification on the performance of subsequent medication extraction was measured. Results: The gold standard included 30,815 protected health information elements and more than one million tokens. The most accurate NLP method achieved 91.92% sensitivity (R) and 95.08% precision (P) overall. Its performance was indistinguishable from that of the human annotators, who obtained 92.15% (R) / 93.95% (P) and 94.55% (R) / 88.45% (P) overall, while the best system obtained 92.91% (R) / 95.73% (P) on the same text. Automated de-identification had minimal impact on the utility of the narrative notes for subsequent information extraction, as measured by the sensitivity and precision of medication name extraction. Discussion and conclusion: NLP-based de-identification shows excellent performance that rivals that of human annotators. Furthermore, unlike manual de-identification, the automated approach scales up to millions of documents quickly and inexpensively.
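The abstract states that F values were computed but reports only sensitivity (recall) and precision. As a minimal sketch, assuming the conventional balanced F-measure (F1, the harmonic mean of precision and recall), which the abstract does not specify, the reported figures can be combined as follows. The helper name `f_measure` is our own illustration, not part of the study.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: weighted harmonic mean of precision and recall (beta=1 gives F1).

    Assumption: the study's "F value" is the standard balanced F1; the abstract does not say.
    """
    if precision + recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Overall figures reported for the most accurate NLP system (percentages).
best_system = f_measure(precision=95.08, recall=91.92)

# Figures for the two human annotators and the best system on the 10% subsample.
annotator_1 = f_measure(precision=93.95, recall=92.15)
annotator_2 = f_measure(precision=88.45, recall=94.55)
system_subsample = f_measure(precision=95.73, recall=92.91)

for name, score in [
    ("best system (overall)", best_system),
    ("annotator 1 (subsample)", annotator_1),
    ("annotator 2 (subsample)", annotator_2),
    ("best system (subsample)", system_subsample),
]:
    print(f"{name}: F1 = {score:.2f}%")
```

Under this assumption, the best system's F1 on the subsample (about 94.3%) is higher than either annotator's (about 93.0% and 91.4%), which is consistent with the abstract's claim that the system rivals human performance.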
Exploiting parallel and comparable corpora to detect lexical correspondences (application to the medical domain)
In this work, we seek to take advantage of properties of textual corpora (parallelism and comparability) for medical informatics by detecting two types of lexical correspondences: translations of medical terms, in order to enrich terminologies; and paraphrases between specialized and lay expressions, in order to help write documents for the general public. A first experiment relies on proven approaches and a parallel corpus, and implements corpus alignment methods. This allowed us to obtain new French translations of English terms, some of which are now integrated into the MeSH thesaurus. A second experiment examines the possibilities of exploiting monolingual comparable corpora. Two methods were designed: the first searches for paraphrases of nominalizations, the second for paraphrases of learned (neoclassical) compounds. We obtained various paraphrases that appear consistent with the specialized/lay opposition under study.
Text-mining tools for extracting information about microbial biodiversity in food
Information on food microbial diversity is scattered across millions of scientific papers. Researchers need tools to assist their bibliographic searches in such large collections. Text-mining and knowledge engineering methods are useful for automatically and efficiently finding relevant information in the life sciences. This work describes how the Alvis text-mining platform has been applied to a large collection of PubMed abstracts of scientific papers in the food microbiology domain. The information targeted by our work is microorganisms, their habitats and their phenotypes. Two knowledge resources, the NCBI taxonomy and the OntoBiotope ontology, were used to detect this information in texts. The results of the text-mining process were indexed and are presented through the AlvisIR Food online semantic search engine. In this paper, we also use two illustrative examples to show the potential of this new tool to assist studies on ecological diversity and on the origin of microbial presence in food.
Participation of team LAI in the DEFT 2019 challenge
International audience. We present in this article the methods developed and the results obtained during our participation in task 3 of the DEFT 2019 evaluation campaign. We used simple rule-based or machine-learning approaches. Our results are very good on information that is simple to extract, such as the patient's age and gender, but they remain mixed on the more difficult tasks.
Defining Medical Words: Transposing Morphosemantic Analysis from French to English
MEDINFO. International audience. Medical language, like many technical languages, is rich in morphologically complex words, many of which have their roots in Greek and Latin, in which case they are called neoclassical compounds. Morphosemantic analysis can help generate definitions of such words. This paper reports work on adapting a morphosemantic analyzer dedicated to French (DériF) to analyze English medical neoclassical compounds. It presents the principles of this transposition and its current performance. The analyzer was tested on a set of 1,299 compounds extracted from the WHO-ART terminology: 859 could be decomposed and defined, 675 of them successfully. An advantage of this process is that complex linguistic analyses designed for French could be successfully transferred to the analysis of English medical neoclassical compounds. Moreover, the resulting system can produce more complete analyses of English medical compounds than existing ones, including a hierarchical decomposition and a semantic gloss of each word.
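The abstract gives raw counts rather than rates. The sketch below derives three rates from those counts; the labels "coverage", "precision" and "overall" are our own interpretation of "859 could be decomposed and defined, 675 of which successfully", not terms the paper is confirmed to use.

```python
# Counts reported for the WHO-ART evaluation set.
total_compounds = 1299
decomposed = 859   # compounds the analyzer could decompose and define
correct = 675      # decompositions judged successful

# Our own derived rates (interpretation, not figures stated in the paper).
coverage = decomposed / total_compounds   # share of compounds the analyzer handled at all
precision = correct / decomposed          # share of produced analyses that were correct
overall = correct / total_compounds       # correct analyses over the whole set

print(f"coverage:  {coverage:.1%}")
print(f"precision: {precision:.1%}")
print(f"overall:   {overall:.1%}")
```

Note that the overall rate is simply coverage multiplied by precision, so roughly half of the test compounds received a correct analysis under this reading.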
Morphosemantic parsing of medical compound words: Transferring a French analyzer to English
International audience. Medical language, like many technical languages, is rich in morphologically complex words, many of which have their roots in Greek and Latin, in which case they are called neoclassical compounds. Morphosemantic analysis can help generate definitions of such words. The structural similarity of these compounds across several European languages has also been observed, which suggests that the same linguistic analysis could be applied to neoclassical compounds from different languages with minor modifications. This paper reports work on adapting a morphosemantic analyser dedicated to French (DériF) to analyse English neoclassical compounds. It presents the principles of this transposition and its current performance.
Text mining tools for extracting information about microbial biodiversity in food
Introduction: Information on food microbial biodiversity is scattered across millions of scientific papers (2 million references in the PubMed bibliographic database in 2017). An exhaustive manual analysis of these documents is impossible. Text-mining and knowledge engineering methods can assist researchers in finding relevant information. Material & Methods: We propose to study bacterial biodiversity using text-mining tools from the Alvis platform. First, we analyzed terms that designate Microbial and Habitat entities in text. Microorganism names were predicted using the NCBI taxonomy. Habitat entities were detected using the syntactic structure of the terms and the OntoBiotope ontology, which has been specifically enriched for recognizing food terms in text. Second, we predicted links between microorganisms and their habitats (labeled “Lives_in” relationships) using pattern-based and machine-learning-based methods. The text-mining predictions are indexed and presented in a semantic search engine. Results: The AlvisIR search engine for microbe literature gave online access to 1.2 million PubMed abstracts in 2015, 13% of which are specific to food. This tool makes it possible to use text-mining results to search for information on bacterial biodiversity. It covers all types of microbial habitats to help understand the origin of microbial presence in food. Significance: This work presents the first semantic search engine dedicated to better understanding microbial food biodiversity from text.
Text-mining needs of the food microbiology research community
To ensure the usefulness of a bioinformatics service, analysis of user needs is an essential step. Furthermore, a service that anticipates the identified needs is more readily accepted by users. The aim of this work is to provide an overview of the requirements of a microbial diversity research community for ontology-based text-mining applications. This study is part of the development of OpenMinTeD, the European infrastructure for text mining, which targets biodiversity among other research fields. The requirements analysis was carried out through targeted online surveys, interviews, focus group meetings and workshops. This work yields a detailed, up-to-date landscape of stakeholders (data providers, producers and consumers), their potential roles and their expectations of general interest with respect to text-mining applications. We introduce a user-centered approach that focuses on the functional requirements of microbiologist end users, including application user interfaces. The resulting description of these needs guides OpenMinTeD's current development in designing and carrying out activities within text-mining projects for the microbiology community.
Text-mining tools for extracting information about microbial biodiversity in food
Article in press. International audience. Information on food microbial diversity is scattered across millions of scientific papers. Researchers need tools to assist their bibliographic searches in such large collections. Text-mining and knowledge engineering methods are useful for automatically and efficiently finding relevant information in the life sciences. This work describes how the Alvis text-mining platform has been applied to a large collection of PubMed abstracts of scientific papers in the food microbiology domain. The information targeted by our work is microorganisms, their habitats and their phenotypes. Two knowledge resources, the NCBI taxonomy and the OntoBiotope ontology, were used to detect this information in texts. The results of the text-mining process were indexed and are presented through the AlvisIR Food online semantic search engine. In this paper, we also use two illustrative examples to show the potential of this new tool to assist studies on ecological diversity and on the origin of microbial presence in food.
Text-mining and ontologies: new approaches to knowledge discovery of microbial diversity
International audience. Microbiology research has access to a very large amount of public information on the habitats of microorganisms. Many areas of microbiology research use this information, primarily biodiversity studies. However, habitat information is expressed in unstructured natural language, which hinders its large-scale exploitation. Similar habitats are very often described by different terms (e.g., intestine and gut), which makes them hard to compare automatically. As argued by Ivana et al. (2010), a common reference to standardize these habitat descriptions is a necessity. We propose OntoBiotope, an ontology we have been developing since 2010. OntoBiotope has a formal, machine-readable representation that enables the indexing of information as well as conceptualization and reasoning.
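To illustrate why a common reference helps, here is a toy sketch in which free-text habitat mentions are mapped to a shared concept identifier, so that synonyms such as "intestine" and "gut" compare equal. The synonym table and the identifier `HABITAT:0001` are hypothetical placeholders, not actual OntoBiotope entries, and exact dictionary lookup is a deliberate simplification of the ontology-based matching the abstract describes.

```python
from typing import Optional

# Hypothetical synonym table: surface forms -> shared concept identifier.
# "HABITAT:0001" is a placeholder, NOT a real OntoBiotope identifier.
SYNONYMS = {
    "intestine": "HABITAT:0001",
    "gut": "HABITAT:0001",
}


def normalize(mention: str) -> Optional[str]:
    """Map a habitat mention to a concept ID by case-insensitive exact lookup."""
    return SYNONYMS.get(mention.strip().lower())


def same_habitat(a: str, b: str) -> bool:
    """Two mentions denote the same habitat if both map to the same known concept."""
    concept_a, concept_b = normalize(a), normalize(b)
    return concept_a is not None and concept_a == concept_b


print(same_habitat("Intestine", "gut"))
```

Unknown mentions map to `None` and never match, so the comparison errs on the side of reporting "different" rather than conflating unrelated habitats.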
- …